
Conversation

Contributor

@valentinewallace commented Sep 2, 2025

Implements the always-online counterparty side of sending payments as an often-offline sender to an often-offline recipient.

Partially addresses #2298
Based on #4044

@ldk-reviews-bot commented Sep 2, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

if let Some(per_commitment_secret) = per_commitment_secret {
if self.holder_commitment_point.can_advance() {
let mut release_htlc_message_paths = Vec::new();
for htlc in &self.context.pending_inbound_htlcs {
// TODO: how to handle the errors here
Contributor Author

For reviewers:

Would be good to discuss how to handle failure to create the path in path_for_release_held_htlc here.
This may be fine for v1 since the failures should basically never happen, but in theory we could end up in a situation where the HTLC gets close to expiring and the sender isn't online for the HTLC to be failed back to them, which would result in FC.

It's a little tricky because it seems nontrivial to add support for returning HTLCs-to-fail from this method.

One option would be to make path_for_release_held_htlc infallible by having it fall back to creating a 1-hop blinded path.

Another more robust but more complex-seeming option would be to attempt to create the blinded path when the corresponding update_add is received, and store the resulting path in the pending HTLC for use when it comes time to generate the RAA. That would allow us to cache the HTLC's pending failure in the InboundHTLCResolution if we fail to create a blinded path for the RAA, though it would reintroduce a potential timing attack that we had eliminated during the refactor that added ChannelManager::decode_update_add_htlcs.
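
A minimal sketch of what the infallible-fallback option could look like, with simplified types and a hypothetical try_multi_hop_path helper rather than the actual LDK API:

// Hypothetical sketch of option 1: always return *some* reply path,
// preferring a multi-hop path through peers and falling back to a 1-hop
// path consisting only of our own node. Paths are simplified to plain
// node-id lists here; the real code would build blinded message paths.
fn path_for_release_held_htlc(
    our_node_id: [u8; 33], peers: &[[u8; 33]],
) -> Vec<[u8; 33]> {
    match try_multi_hop_path(our_node_id, peers) {
        Ok(path) => path,
        // Should basically never be hit, but keeps RAA construction infallible.
        Err(()) => vec![our_node_id],
    }
}

// Stand-in for the real, fallible multi-hop path construction.
fn try_multi_hop_path(
    our_node_id: [u8; 33], peers: &[[u8; 33]],
) -> Result<Vec<[u8; 33]>, ()> {
    let intro = peers.first().ok_or(())?;
    Ok(vec![*intro, our_node_id])
}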

Collaborator

end up in a situation where the HTLC gets close to expiring and the sender isn't online for the HTLC to be failed back to them, which would result in FC.

It shouldn't: the LSP (and LDK generally) won't FC a channel because of an inbound HTLC for which we don't have the preimage (i.e. it's our counterparty's money). The client probably will FC here, but that's a long-standing issue (#4048).

One option would be to make path_for_release_held_htlc infallible by having it fall back to creating a 1-hop blinded path.

Seems fine to me.

Contributor

Advantage of the 1-hop blinded path would probably also be that peers don't need to be cached in flow?

Contributor Author

Advantage of the 1-hop blinded path would probably also be that peers don't need to be cached in flow?

The approach discussed above means we'll fall back to using a 1-hop path if creating a multi-hop path using the cached peers fails, although falling back should basically never happen.

@valentinewallace valentinewallace mentioned this pull request Sep 2, 2025
3 tasks
@valentinewallace valentinewallace requested review from joostjager and removed request for tankyleo September 2, 2025 15:35
codecov bot commented Sep 2, 2025

Codecov Report

❌ Patch coverage is 46.18474% with 134 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.70%. Comparing base (654180d) to head (aeaa42d).

Files with missing lines | Patch % | Lines
lightning/src/ln/channelmanager.rs | 29.37% | 98 Missing and 3 partials ⚠️
lightning/src/offers/flow.rs | 46.51% | 23 Missing ⚠️
lightning/src/ln/channel.rs | 79.54% | 7 Missing and 2 partials ⚠️
lightning/src/util/ser_macros.rs | 50.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4045      +/-   ##
==========================================
- Coverage   88.77%   88.70%   -0.07%     
==========================================
  Files         176      176              
  Lines      129375   129566     +191     
  Branches   129375   129566     +191     
==========================================
+ Hits       114856   114935      +79     
- Misses      11919    12020     +101     
- Partials     2600     2611      +11     
Flag | Coverage Δ
fuzzing | 22.03% <21.98%> (+0.01%) ⬆️
tests | 88.54% <46.18%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown.

Contributor

@joostjager left a comment

Not a bad change set for what it gets us. Two comments:

  • Are existing tests covering some or most of this code, or does that need to be added still?
  • I think I would have preferred the single hop blinded path option for simplicity.

// for blinded paths.
let peer = MessageForwardNode { node_id: peer_node_id, short_channel_id: None };
match peers_cache.len() {
num_peers if num_peers >= MAX_CACHED_PEERS => {
Contributor

Can't this be simplified by keeping all peers in memory? It can't be that many.

Contributor Author

Definitely could be simplified that way. My thought process was that this is intended for use by (up to) an acinq-sized node. Acinq has ~1,500 public channel peers and probably thousands(?) of tiny mobile peers, so cloning all those for every mobile client's async payment doesn't seem ideal. It's an extreme example, but the code didn't seem too bad as-is; I agree this may be premature though.

Contributor

I would just remove it really. For an acinq-sized node, I don't think the memory usage matters.

Contributor Author

Alright, I guess you could also argue that optimizing here can wait for a larger refactor that gets rid of a lot more low-hanging-fruit clones of peer vecs in the flow, as discussed in #3639. Removed in a fixup.

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@@ -3173,14 +3186,17 @@ impl_writeable_msg!(RevokeAndACK, {
channel_id,
per_commitment_secret,
next_per_commitment_point
}, {});
}, {
(0, release_htlc_message_paths, optional_vec)
Contributor Author

I'm not sure if we should use an experimental TLV for this until we get more spec feedback.

Contributor

I wouldn't really bother with that. It's just a convention, but does require a switch-over even when everything is good.
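
For concreteness, the experimental-TLV alternative would just move the field into the extension type range, e.g. (65541 is an arbitrary odd example, not a spec-assigned type):

(65541, release_htlc_message_paths, optional_vec)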

In upcoming commits, we'll be creating blinded paths during the process of
creating a revoke_and_ack message within the Channel struct. These paths will
be included in said RAA to be used as reply paths for often-offline senders'
held_htlc_available messages.

Because we hold the per-peer lock corresponding to the Channel while creating
this RAA, we can't use our typical approach of calling
ChannelManager::get_peers_for_blinded_path to create these blinded paths.
The ::get_peers method takes each peer's lock in turn in order to check for
usable channels/onion message feature support, and it's not permitted to hold
multiple peer state locks at the same time due to the potential for deadlocks
(see the debug_sync module).

To avoid taking other peer state locks while holding a particular Channel's
peer state lock, here we cache the set of peers in the OffersMessageFlow, which
is the struct that ultimately creates the blinded paths for the RAA.
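
A minimal sketch of the kind of cache described here (names and types are illustrative, not the actual OffersMessageFlow fields):

use std::collections::HashSet;
use std::sync::Mutex;

// Illustrative peer cache kept in the message flow, so reply paths can be
// built for the RAA without taking any other peer's channel lock.
struct CachedPeers {
    // Node ids of connected peers that support onion messages.
    peers: Mutex<HashSet<[u8; 33]>>,
}

impl CachedPeers {
    fn peer_connected(&self, node_id: [u8; 33]) {
        self.peers.lock().unwrap().insert(node_id);
    }
    fn peer_disconnected(&self, node_id: [u8; 33]) {
        self.peers.lock().unwrap().remove(&node_id);
    }
    // Snapshot used when constructing blinded paths; no channel locks needed.
    fn peers_for_blinded_paths(&self) -> Vec<[u8; 33]> {
        self.peers.lock().unwrap().iter().copied().collect()
    }
}
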
As part of supporting sending payments as an often-offline sender, the sender
needs to be able to set a flag in their update_add_htlc message indicating that
the HTLC should be held until receipt of a release_held_htlc onion message from
the often-offline payment recipient.

We don't yet ever set this flag, but lay the groundwork by including the field
in the update_add struct.

See-also <lightning/bolts#989>

@valentinewallace force-pushed the 2025-08-async-send-lsp branch 3 times, most recently from c487147 to b079c36 (September 4, 2025 21:23)

As part of supporting sending payments as an often-offline sender, the
often-offline sender's channel counterparty needs to advertise a feature bit
indicating that they support holding onto the sender's HTLC until they receive
a release_held_htlc onion message from the recipient indicating that they are
online and ready to receive the payment.

See-also <lightning/bolts#989>

We don't yet advertise support of this feature, so here we also disconnect from
peers that try to send us hold HTLCs.

As part of supporting sending payments as an often-offline sender, the
often-offline sender's channel counterparty needs to advertise a feature bit
indicating that they support holding onto the sender's HTLC until they receive
a release_held_htlc onion message from the recipient indicating that they are
online and ready to receive the payment.

Here we add a config flag to turn on advertising this feature, and fail back
hold_htlcs if this config flag is not set.

See-also <lightning/bolts#989>
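
Schematically, the gating described in this commit amounts to something like the following (enable_htlc_hold and provided_node_features are names that appear elsewhere in this PR; the struct shapes and the fail-back helper are stand-ins):

// Illustrative gating only: advertise htlc_hold iff the config flag is set,
// and fail back held HTLCs when it isn't.
struct UserConfigSketch { enable_htlc_hold: bool }
struct NodeFeaturesSketch { htlc_hold: bool }

fn provided_node_features(config: &UserConfigSketch) -> NodeFeaturesSketch {
    NodeFeaturesSketch { htlc_hold: config.enable_htlc_hold }
}

fn should_fail_back_held_htlc(config: &UserConfigSketch, hold_htlc_set: bool) -> bool {
    hold_htlc_set && !provided_node_features(config).htlc_hold
}
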
As part of supporting sending payments as an often-offline sender, the sender
needs to send held_htlc_available onion messages where the reply path
terminates at their always-online channel counterparty that is holding the HTLC
until the recipient comes online. That way when the recipient sends
release_held_htlc, the sender's counterparty will receive that message.

To accomplish this, the sender's always-online counterparty includes said reply
path in the revoke_and_ack message corresponding to the held HTLC. Here we add
support for this field, though we don't set it yet.

We also had to tweak the ser macros for this because impl_writeable_msg had
never had to write a Vec in a message TLV field before.

Makes the next commit cleaner.

As part of supporting sending payments as an often-offline sender, the sender's
always-online channel counterparty needs to hold onto the sender's HTLC until
they receive a release_held_htlc onion message from the often-offline
recipient.

Here we implement storing these held HTLCs in the existing
ChannelManager::pending_intercepted_htlcs map.

We want to move in the direction of obviating the need to persist the
ChannelManager entirely, so it doesn't really make sense to add a whole new map
for these HTLCs.
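
In outline, with made-up types rather than the real pending_intercepted_htlcs entries, the storage step is just:

use std::collections::HashMap;

// Stand-ins for the real intercept-id / pending-HTLC types.
#[derive(Hash, PartialEq, Eq)]
struct InterceptId([u8; 32]);
struct HeldHtlc { payment_hash: [u8; 32], amount_msat: u64 }

// Held HTLCs get parked in the same map used for intercepted forwards,
// keyed by an intercept id, until a release_held_htlc message arrives.
fn hold_htlc(pending: &mut HashMap<InterceptId, HeldHtlc>, id: InterceptId, htlc: HeldHtlc) {
    pending.insert(id, htlc);
}
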
As part of supporting sending payments as an often-offline sender, the sender
needs to send held_htlc_available onion messages such that the reply path to
the message terminates at their always-online channel counterparty that is
holding the HTLC. That way when the recipient responds with release_held_htlc,
the sender's counterparty will receive that message.

Here we add a method for creating said reply path, which will be used in the
next commit.
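
A rough sketch of the shape of that reply path (illustrative types only; the real code builds a blinded message path, and the idea that it carries context identifying the held HTLC is an assumption here):

// The reply path terminates at our own node (the always-online counterparty
// holding the HTLC), optionally via a peer as the introduction node.
struct ReplyPathSketch {
    introduction_node: Option<[u8; 33]>,
    terminal_node: [u8; 33],
    held_htlc_context: [u8; 32],
}

fn reply_path_for_held_htlc(
    our_node_id: [u8; 33], peers: &[[u8; 33]], held_htlc_context: [u8; 32],
) -> ReplyPathSketch {
    ReplyPathSketch {
        introduction_node: peers.first().copied(),
        terminal_node: our_node_id,
        held_htlc_context,
    }
}
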
As part of supporting sending payments as an often-offline sender, the sender
needs to send held_htlc_available onion messages such that the reply path to
the message terminates at their always-online channel counterparty that is
holding the HTLC. That way when the recipient responds with release_held_htlc,
the sender's counterparty will receive that message.

Here the counterparty starts including said reply paths in the revoke_and_ack
message destined for the sender, so the sender can use these paths in
subsequent held_htlc_available messages.

We put the paths in the RAA to ensure the sender receives the blinded paths,
because failure to deliver the paths means the HTLC will time out and fail.

As part of supporting sending payments as an often-offline sender, the sender's
always-online channel counterparty needs to hold onto the sender's HTLC until
they receive a release_held_htlc onion message from the often-offline
recipient.

Here we implement forwarding these held HTLCs upon receipt of the release
message from the recipient.
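
And the release side, again with stand-in types rather than the real ChannelManager plumbing, boils down to:

use std::collections::HashMap;

#[derive(Hash, PartialEq, Eq)]
struct InterceptId([u8; 32]);
struct HeldHtlc { payment_hash: [u8; 32], amount_msat: u64 }

// On release_held_htlc, pull the parked HTLC back out of the intercept map
// and hand it to the normal forwarding pipeline; if it's already gone
// (e.g. failed back), there's nothing left to do.
fn release_held_htlc(
    pending: &mut HashMap<InterceptId, HeldHtlc>, id: &InterceptId,
) -> Option<HeldHtlc> {
    pending.remove(id)
}
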
Comment on lines +6409 to +6433
// Nodes shouldn't expect us to hold HTLCs for them if we don't advertise htlc_hold feature
// support.
//
// If we wanted to pretend to be a node that didn't understand the feature at all here, the
// correct behavior would've been to disconnect the sender when we first received the
// update_add message. However, this would make the `UserConfig::enable_htlc_hold` option
// unsafe -- if our node switched the config option from on to off just after the sender
// enqueued their update_add + CS, the sender would continue retransmitting those messages
// and we would keep disconnecting them until the HTLC timed out.
if update_add_htlc.hold_htlc.is_some()
&& !BaseMessageHandler::provided_node_features(self).supports_htlc_hold()
{
let reason = LocalHTLCFailureReason::TemporaryNodeFailure;
let htlc_fail = self.htlc_failure_from_update_add_err(
&update_add_htlc,
&incoming_counterparty_node_id,
reason,
is_intro_node_blinded_forward,
&shared_secret,
);
let failure_type =
get_htlc_failure_type(outgoing_scid_opt, update_add_htlc.payment_hash);
htlc_fails.push((htlc_fail, failure_type, reason.into()));
continue;
}
Contributor Author

Seeking conceptual feedback on this approach, including the failure reason

Contributor

I wonder if it isn't better and simpler to just let the htlc through and log? Maybe the receiver is online or can be woken up and it just works. And otherwise that error will come back anyway.

It probably also depends on how the sender is going to interpret this failure. If it handles it like any other, it isn't necessary to be specific? Zooming out, I don't think any of this matters much in a client<->LSP setup, except for some debugging, which can be done through logs too.

@@ -12983,6 +12983,8 @@ where
for (err, counterparty_node_id) in failed_channels.drain(..) {
let _ = handle_error!(self, err, counterparty_node_id);
}

let _ = self.flow.peer_disconnected(counterparty_node_id);
Contributor

Log error? Or otherwise don't return anything from peer_disconnected as this is the only usage?

@@ -3268,7 +3273,8 @@ impl_writeable_msg!(UpdateAddHTLC, {
onion_routing_packet,
}, {
(0, blinding_point, option),
(65537, skimmed_fee_msat, option)
(2, hold_htlc, option),
Contributor

Just checking: the way this is set up, does this allow for future TLV extension of the hold_htlc field in case we wanted to communicate more info to the sender LSP about the hold/release process?

@@ -10732,6 +10800,22 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
fail_intercepted_htlc();
},
}
} else if pending_add.forward_info.routing.should_hold_htlc() {
Contributor

Duplicated code. I know that Rust isn't very inviting towards reusing code...

Contributor

@joostjager commented Sep 5, 2025

This PR looks good to me, but I think it would be good to have a passing unit test in the follow up and/or an ldk-node test before merging this one.

Contributor

@elnosh commented Sep 5, 2025

we could end up in a situation where the HTLC gets close to expiring and the sender isn't online for the HTLC to be failed back to them, which would result in FC.

From the above #4045 (comment), could this also happen if the LSP is holding an HTLC from an async sender but the receiver didn't send a release_held_htlc message back in time and now the sender isn't online for the LSP to fail it back?

Although if so, I guess it should be solved by #4048.
